Achieving Domain Specificity in SMT without Overt Siloing
Abstract
We examine data pooling as a method for improving Statistical Machine Translation (SMT) quality for narrowly defined domains, such as the data of a particular company or public entity. By pooling all available data, building large SMT engines, and using domain-specific target language models, we obtain gains in quality and achieve the generalizability and resilience of a larger SMT engine with the precision of a domain-specific engine.
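As a rough illustration of how a domain-specific target language model can steer a pooled engine, the sketch below rescores candidate translations by adding a language model score to a translation score. It is a toy under stated assumptions, not the system described in the abstract: the bigram model with add-one smoothing, the function names `train_bigram_lm` and `pick_translation`, the `lm_weight` parameter, the candidate scores, and the in-domain sentences are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Train a toy add-one-smoothed bigram LM (an assumption for illustration).

    Returns a function mapping a sentence to its log-probability.
    """
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])                  # counts of bigram contexts
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of adjacent pairs
    v = len(vocab) + 1  # reserve one extra slot for unseen words

    def logprob(sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return logprob


def pick_translation(candidates, lm_logprob, lm_weight=1.0):
    """Pick the candidate with the best combined score.

    candidates: list of (text, translation_log_score) pairs, standing in for
    the n-best output of a large engine trained on pooled data.
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * lm_logprob(c[0]))[0]


# Hypothetical in-domain target-side text (e.g. one company's material).
domain_lm = train_bigram_lm([
    "open a savings account",
    "close the account",
    "the account balance is low",
])

# Hypothetical candidates: the pooled engine slightly prefers the generic wording.
candidates = [
    ("the bill balance is low", -1.0),
    ("the account balance is low", -1.2),
]

print(pick_translation(candidates, domain_lm))                 # domain LM applied
print(pick_translation(candidates, domain_lm, lm_weight=0.0))  # pooled score only
```

With the language model weight set to zero the pooled score alone decides the outcome. With the weight at 1.0, the in-domain model shifts the choice toward the wording seen in the domain text.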
Similar resources
Experiments on Domain Adaptation for English–Hindi SMT
Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target language text. If a significant amount of out-of-domain data is added to the training data, the quality of translation can drop. On the other hand, training an SMT system on a small amount of training material for given in-domain data leads to narrow lexical coverage which ...
Functional Siloing? Towards a Practical Understanding of Operational Boundaries Using Critical Systems Heuristics
The paper discusses the application of Critical Systems Heuristics to the problem of functional siloing. Functional siloing refers to a situation in which the functional areas of an organisation become overly focused on local performance measures to the detriment of the organisation as a whole. The authors liken the organisational fragmentation to Ulrich’s description of dysfunctional social pl...
Grounding Imperatives to Actions is Not Enough: A Challenge for Grounded NLU for Robots from Human-Human Data
We present a proposal for a Natural Language Understanding method for simple pick-and-place robots which maps utterances to different levels in an action hierarchy. The hierarchy is a graph containing both lower-level action and higher-level goal levels. This attempts to overcome the surprising lack of overt imperative verb forms in natural task-oriented dialogue, which we show to be the case s...
Context Adaptation in Statistical Machine Translation Using Models with Exponentially Decaying Cache
We report results from a domain adaptation task for statistical machine translation (SMT) using cache-based adaptive language and translation models. We apply an exponential decay factor and integrate the cache models in a standard phrase-based SMT decoder. Without the need for any domain-specific resources we obtain a 2.6% relative improvement on average in BLEU scores using our dynamic adaptat...
Enabling Domain Experts to Model and Execute Tasks in Flexible Human-Robot Teams
Recent advances in safe human-robot coexistence make collaboration of humans and robots in achieving common goals feasible. We propose a concept that treats human and robot agents as equal partners in executing a task specified by a shared task model. Equality between agents offers high flexibility, since, for example, the team composition may change arbitrarily without interrupting the work in progress. T...